Wrapper Verification

Author

  • Nicholas Kushmerick
Abstract

Many Internet information-management applications (e.g., information integration systems) require a library of wrappers: specialized information extraction procedures that translate a source's native format into a structured representation suitable for further application-specific processing. Maintaining wrappers is tedious and error-prone, because the formatting regularities on which wrappers rely change frequently on the decentralized and dynamic Internet. The wrapper verification problem is to determine whether a wrapper is operating correctly. Standard regression testing approaches are inappropriate, because both the formatting regularities on which wrappers rely and the source's underlying content may change. We introduce RAPTURE, a fully implemented, domain-independent wrapper verification algorithm. RAPTURE computes a probabilistic similarity measure between a wrapper's expected and observed output, where similarity is defined in terms of simple numeric features (e.g., the length, or the fraction of punctuation characters) of the extracted strings. Experiments with numerous actual Internet sources demonstrate that RAPTURE performs substantially better than standard regression testing.
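The idea of comparing expected and observed output through simple numeric features can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`features`, `fit`, `similarity`), the treatment of features as independent, and the Gaussian-shaped score are all assumptions made for illustration. Only the two features, string length and fraction of punctuation characters, come from the abstract.

```python
import math
import string
from statistics import mean, pstdev


def features(s):
    """Simple numeric features of one extracted string."""
    n = len(s)
    punct = sum(ch in string.punctuation for ch in s)
    return {"length": float(n), "punct_frac": punct / n if n else 0.0}


def fit(strings):
    """Estimate (mean, std) of each feature from known-good wrapper output."""
    rows = [features(s) for s in strings]
    model = {}
    for name in rows[0]:
        vals = [r[name] for r in rows]
        # Floor the std so a constant feature does not cause division by zero.
        # Assumption: a tiny floor makes any deviation on that feature decisive.
        model[name] = (mean(vals), max(pstdev(vals), 1e-6))
    return model


def similarity(model, strings):
    """Score in [0, 1]; 1 means the observed feature means match the expected ones.

    Assumption: features are independent, so per-feature scores are multiplied.
    """
    if not strings:
        return 0.0
    rows = [features(s) for s in strings]
    score = 1.0
    for name, (mu, sigma) in model.items():
        observed = mean(r[name] for r in rows)
        z = (observed - mu) / sigma
        score *= math.exp(-0.5 * z * z)
    return score


# Known-good output of a hypothetical wrapper that extracts person names.
model = fit(["Alice Smith", "Bob Jones", "Carol White"])

# New content, same formatting: the wrapper still extracts names.
good = similarity(model, ["Dave Brown", "Erin Black"])

# The page layout changed and the wrapper now extracts markup.
bad = similarity(model, ['<td align="left"><b>', "</b></td><td>"])
```

In this sketch `good` stays high even though the names themselves differ from the training data, while `bad` collapses toward zero. That is the property exact-match regression testing lacks when the source's content changes.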


Similar resources

Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gen Expression

Nowadays, the growing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is a dimensionality reduction method, carried out through filter and wrapper approaches. Wrapper methods are more accurate than filter methods, whereas filter methods run faster and have a lighter computational burden. With ...


Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection

Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...


STeP: Deductive-Algorithmic Verification of Reactive and Real-Time Systems

The Stanford Temporal Prover, STeP, combines deductive methods with algorithmic techniques to verify linear-time temporal logic specifications of reactive and real-time systems. STeP uses verification rules, verification diagrams, automatically generated invariants, model checking, and a collection of decision procedures to verify finite- and infinite-state systems. computer-aided formal verification o...


Ethernet Wrapper: Extension of the TCP Wrapper

One of the popular network security programs supporting host access control is 'TCP Wrapper' [13]. TCP Wrapper is a software-only system, and many computers connected to the Internet use it. However, TCP Wrapper performs IP address-based access control, and an IP address is not a reliable credential for authenticating a host. In this paper, we point out two possible attacks against the TCP Wrappe...


Interface-based Specification and Verification of Concurrency Controllers

We present a modular approach to specification and verification of concurrency controllers by decoupling their behavior and interface specifications. The behavior specification of a concurrency controller defines how its shared variables change their values whereas the interface specification defines the order in which a client thread should call its methods. We show that the concurrency controllers ...



Publication date: 2000